This dataset is a collection of scraped public Twitter updates, used in conjunction with an academic project to study geolocation data related to tweeting. From the explanatory PDF in the dataset collection:
We provide both training set and test set (collected from September 2009 to January 2010) in the paper You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users in CIKM 2010. The training set contains 115,886 Twitter users and 3,844,612 updates from the users. All the locations of the users are self-labeled in United States in city-level granularity. The test set contains 5,136 Twitter users and 5,156,047 tweets from the users. All the locations of users are uploaded from their smart phones with the form of "UT: Latitude,Longitude".
Please cite the following paper when using the dataset: Z. Cheng, J. Caverlee, and K. Lee. You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users. In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM), Toronto, Oct 2010.
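The test-set locations are described above as strings of the form "UT: Latitude,Longitude". A minimal Python sketch of how such a string might be parsed follows. The function name `parse_ut_location`, the exact whitespace handling, and the sample coordinates are assumptions for illustration only. The dataset's actual file layout is not described here, so the pattern may need adjusting against the real files.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: the literal prefix "UT:" followed by two signed decimal
# numbers separated by a comma. Optional whitespace is tolerated.
_UT_PATTERN = re.compile(
    r"^\s*UT:\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_ut_location(text: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if text looks like 'UT: lat,lon', else None."""
    match = _UT_PATTERN.match(text)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid coordinate ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon


if __name__ == "__main__":
    # Illustrative inputs only; these are not taken from the dataset.
    print(parse_ut_location("UT: 30.62,-96.33"))
    print(parse_ut_location("College Station, TX"))
```

Returning `None` for anything that does not match keeps self-labeled city names, such as those in the training set, from being mistaken for coordinates.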